    Affine Invariant Covariance Estimation for Heavy-Tailed Distributions

    In this work we provide an estimator for the covariance matrix of a heavy-tailed multivariate distribution. We prove that the proposed estimator $\widehat{\mathbf{S}}$ admits an affine-invariant bound of the form $(1-\varepsilon)\mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon)\mathbf{S}$ in high probability, where $\mathbf{S}$ is the unknown covariance matrix and $\preccurlyeq$ is the positive semidefinite order on symmetric matrices. The result only requires the existence of fourth-order moments, and allows for $\varepsilon = O(\sqrt{\kappa^4 d \log(d/\delta)/n})$, where $\kappa^4$ is a measure of kurtosis of the distribution, $d$ is the dimensionality of the space, $n$ is the sample size, and $1-\delta$ is the desired confidence level. More generally, we can allow for regularization with level $\lambda$, in which case $d$ gets replaced with the degrees of freedom number. Denoting $\text{cond}(\mathbf{S})$ the condition number of $\mathbf{S}$, the computational cost of the novel estimator is $O(d^2 n + d^3 \log(\text{cond}(\mathbf{S})))$, which is comparable to the cost of the sample covariance estimator in the statistically interesting regime $n \ge d$. We consider applications of our estimator to eigenvalue estimation with relative error, and to ridge regression with heavy-tailed random design.
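
    The affine-invariant bound above is equivalent to all eigenvalues of $\mathbf{S}^{-1/2}\widehat{\mathbf{S}}\mathbf{S}^{-1/2}$ lying in $[1-\varepsilon, 1+\varepsilon]$, since conjugation by $\mathbf{S}^{-1/2}$ preserves the positive semidefinite order. A minimal numpy sketch of this check follows; it does not reproduce the paper's estimator, and `S_hat` stands for any candidate estimate (here, the plain sample covariance).

```python
import numpy as np

def affine_invariant_error(S, S_hat):
    """Smallest eps with (1 - eps) S <= S_hat <= (1 + eps) S in the
    positive semidefinite order (S assumed positive definite), i.e. the
    spectral deviation of S^{-1/2} S_hat S^{-1/2} from the identity."""
    w, V = np.linalg.eigh(S)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    M = S_inv_half @ S_hat @ S_inv_half
    return np.max(np.abs(np.linalg.eigvalsh(M) - 1.0))

# Toy usage: sample covariance of Gaussian data as the candidate estimate.
rng = np.random.default_rng(0)
d, n = 5, 10_000
A = rng.standard_normal((d, d))
S = A @ A.T + np.eye(d)                      # ground-truth covariance
X = rng.multivariate_normal(np.zeros(d), S, size=n)
S_hat = X.T @ X / n                          # sample covariance
print(affine_invariant_error(S, S_hat))      # small for n >> d
```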

    Finite-sample Analysis of M-estimators using Self-concordance

    We demonstrate how self-concordance of the loss can be exploited to obtain asymptotically optimal rates for M-estimators in finite-sample regimes. We consider two classes of losses: (i) canonically self-concordant losses in the sense of Nesterov and Nemirovski (1994), i.e., with the third derivative bounded by the $3/2$ power of the second; (ii) pseudo self-concordant losses, for which the power is removed, as introduced by Bach (2010). These classes contain some losses arising in generalized linear models, including logistic regression; in addition, the second class includes some common pseudo-Huber losses. Our results consist in establishing the critical sample size sufficient to reach the asymptotically optimal excess risk for both classes of losses. Denoting $d$ the parameter dimension and $d_{\text{eff}}$ the effective dimension, which takes into account possible model misspecification, we find the critical sample size to be $O(d_{\text{eff}} \cdot d)$ for canonically self-concordant losses, and $O(\rho \cdot d_{\text{eff}} \cdot d)$ for pseudo self-concordant losses, where $\rho$ is the problem-dependent local curvature parameter. In contrast to the existing results, we only impose local assumptions on the data distribution, assuming that the calibrated design, i.e., the design scaled with the square root of the second derivative of the loss, is subgaussian at the best predictor $\theta_*$. Moreover, we obtain improved bounds on the critical sample size, scaling near-linearly in $\max(d_{\text{eff}}, d)$, under the extra assumption that the calibrated design is subgaussian in the Dikin ellipsoid of $\theta_*$. Motivated by these findings, we construct canonically self-concordant analogues of the Huber and logistic losses with improved statistical properties. Finally, we extend some of these results to $\ell_1$-regularized M-estimators in high dimensions.
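
    To make the two definitions concrete: for the logistic loss $\ell(t) = \log(1 + e^{-t})$, one has $\ell''(t) = \sigma(t)(1-\sigma(t))$ and $\ell'''(t) = \sigma(t)(1-\sigma(t))(1-2\sigma(t))$ with $\sigma$ the sigmoid, so $|\ell'''| \le \ell''$ (pseudo self-concordance in the sense of Bach), while the canonical bound $|\ell'''| \le 2(\ell'')^{3/2}$ fails for large $|t|$. A quick numerical sanity check of both facts (illustration only, not code from the paper):

```python
import numpy as np

# Logistic loss l(t) = log(1 + exp(-t)); with s = sigmoid(t):
#   l''(t)  = s (1 - s),    l'''(t) = s (1 - s) (1 - 2 s)
t = np.linspace(-30, 30, 2001)
s = 1.0 / (1.0 + np.exp(-t))
d2 = s * (1 - s)
d3 = s * (1 - s) * (1 - 2 * s)

# Pseudo self-concordance (Bach 2010): |l'''| <= l'' holds everywhere.
print(np.all(np.abs(d3) <= d2 + 1e-12))             # True

# Canonical self-concordance: |l'''| <= 2 (l'')^{3/2} fails in the tails.
print(np.all(np.abs(d3) <= 2 * d2 ** 1.5))          # False
```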

    Finite-sample analysis of M-estimators using self-concordance

    The classical asymptotic theory for parametric M-estimators guarantees that, in the limit of infinite sample size, the excess risk has a chi-square type distribution, even in the misspecified case. We demonstrate how self-concordance of the loss allows us to characterize the critical sample size sufficient to guarantee a chi-square type in-probability bound for the excess risk. Specifically, we consider two classes of losses: (i) self-concordant losses in the classical sense of Nesterov and Nemirovski, i.e., whose third derivative is uniformly bounded by the $3/2$ power of the second derivative; (ii) pseudo self-concordant losses, for which the power is removed. These classes contain losses corresponding to several generalized linear models, including the logistic loss and pseudo-Huber losses. Our basic result, under minimal assumptions, bounds the critical sample size by $O(d \cdot d_{\text{eff}})$, where $d$ is the parameter dimension and $d_{\text{eff}}$ is the effective dimension that accounts for model misspecification. In contrast to the existing results, we only impose local assumptions that concern the population risk minimizer $\theta_*$. Namely, we assume that the calibrated design, i.e., the design scaled by the square root of the second derivative of the loss, is subgaussian at $\theta_*$. Besides, for losses of type (ii) we require boundedness of a certain measure of curvature of the population risk at $\theta_*$. Our improved result bounds the critical sample size from above as $O(\max\{d_{\text{eff}}, d \log d\})$ under slightly stronger assumptions. Namely, the local assumptions must hold in the neighborhood of $\theta_*$ given by the Dikin ellipsoid of the population risk. Interestingly, we find that, for logistic regression with Gaussian design, these conditions impose no actual restriction: the subgaussian parameter and curvature measure remain near-constant over the Dikin ellipsoid. Finally, we extend some of these results to $\ell_1$-penalized estimators in high dimensions.
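
    For reference, the Dikin ellipsoid appearing in the improved result is the set $\{\theta : \|\theta - \theta_*\|_{\mathbf{H}} \le r\}$, where $\|v\|_{\mathbf{H}} = \sqrt{v^\top \mathbf{H} v}$ and $\mathbf{H}$ is the Hessian of the population risk at $\theta_*$. A small illustrative sketch of a membership test; `H`, `theta_star`, and the radius `r` are placeholder assumptions, not quantities specified by the paper:

```python
import numpy as np

def in_dikin_ellipsoid(theta, theta_star, H, r=1.0):
    """Test ||theta - theta_star||_H <= r, where ||v||_H = sqrt(v' H v)
    and H is a positive definite Hessian of the population risk."""
    v = theta - theta_star
    return float(v @ H @ v) <= r ** 2

# Toy usage with a synthetic positive definite Hessian.
rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))
H = A @ A.T + np.eye(d)
theta_star = np.zeros(d)
print(in_dikin_ellipsoid(0.1 * rng.standard_normal(d), theta_star, H))
```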

    Adaptive Denoising of Signals with Shift-Invariant Structure

    We study the problem of discrete-time signal denoising, following the line of research initiated by [Nem91] and further developed in [JN09, JN10, HJNO15, OHJN16]. Previous papers considered the following setup: the signal is assumed to admit a convolution-type linear oracle -- an unknown linear estimator in the form of the convolution of the observations with an unknown time-invariant filter with small $\ell_2$-norm. It was shown that such an oracle can be "mimicked" by an efficiently computable non-linear convolution-type estimator, in which the filter minimizes the Fourier-domain $\ell_\infty$-norm of the residual, regularized by the Fourier-domain $\ell_1$-norm of the filter. Following [OHJN16], here we study an alternative family of estimators, replacing the $\ell_\infty$-norm of the residual with the $\ell_2$-norm. Such estimators are found to have better statistical properties; in particular, we prove sharp oracle inequalities for their $\ell_2$-loss. Our guarantees require an extra assumption of approximate shift-invariance: the signal must be $\varkappa$-close, in the $\ell_2$-metric, to some shift-invariant linear subspace with bounded dimension $s$. However, this subspace can be completely unknown, and the remainder terms in the oracle inequalities scale at most polynomially with $s$ and $\varkappa$. In conclusion, we show that the new assumption implies the previously considered one, providing explicit constructions of the convolution-type linear oracles with $\ell_2$-norm bounded in terms of the parameters $s$ and $\varkappa$.
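
    As a toy illustration of the Fourier-domain construction: the paper's estimators use one-sided finite-length filters and a constrained formulation, but if one instead assumes an unconstrained filter acting by circular convolution over the whole grid, the squared $\ell_2$-residual objective with an $\ell_1$ penalty on the filter's Fourier transform separates across frequencies and admits a closed-form shrinkage solution. The sketch below is this simplified variant only, not the estimator analyzed in the paper:

```python
import numpy as np

def fourier_l2_l1_denoise(y, lam):
    """Simplified convolution-type estimator: pick a filter phi minimizing
    ||fft(y - phi * y)||_2^2 + lam * ||fft(phi)||_1, assuming circular
    convolution. With Y = fft(y), the problem separates per frequency k:
        min_z |Y_k|^2 |1 - z|^2 + lam |z|,
    solved by the shrinkage z_k = max(0, 1 - lam / (2 |Y_k|^2))."""
    Y = np.fft.fft(y)
    shrink = np.maximum(0.0, 1.0 - lam / (2.0 * np.abs(Y) ** 2 + 1e-12))
    return np.real(np.fft.ifft(shrink * Y))

# Toy usage: a noisy sum of two sinusoids (a shift-invariant signal class).
rng = np.random.default_rng(2)
n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 5 * t / n) + 0.5 * np.sin(2 * np.pi * 12 * t / n)
y = x + 0.3 * rng.standard_normal(n)
x_hat = fourier_l2_l1_denoise(y, lam=100.0)
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))  # below the noise level
```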